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Abstract. This paper analyses the effect of preferential sampling in Geostatistics 
when the choice of new sampling locations is the main interest of the researcher. 
A Bayesian criterion based on maximizing utility functions is used. Simulated 
studies are presented and highlight the strong influence of preferential sampling 
in the decisions. The computational complexity is faced by treating the new local 
sampling locations as a model parameter and the optimal choice is then made 
by analysing its posterior distribution. Finally, an application is presented using 
rainfall data collected during spring in Rio de Janeiro. The results showed that 
the optimal design is substantially changed under preferential sampling effects. 
Furthermore, it was possible to identify other interesting aspects related to pref¬ 
erential sampling effects in estimation and prediction in Geostatistics. 
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1 Introduction 

Diggle et al. (2010) presented a novel methodology to perform inference in the tradi¬ 
tional Geostatistical model under preferential sampling. They assumed that the sample 
design could be described by a log-Gaussian Gox process (Mpller et al., 1998) and per¬ 
formed maximum likelihood estimation for the model parameters through simulation. 
In addition, they have made simulations to evaluate the effect of preferential sampling 
on parameter estimation in Geostatistics, concluding that this was not negligible. In all 
simulations, after obtaining unbiased estimates about the model parameters, the spa¬ 
tial prediction was made by the plug-in method, according to the classical approach to 
perform inference in Geostatistics (see Gressie, 1993; Diggle et ah, 1998). 

Based on this work, many others questions have emerged, and one of them was 
related to the influence of preferential sampling in the optimal design choice. This pro¬ 
cedure is widespread in Geostatistics literature. There are several recent papers dealing 
with this, such as Zidek et al. (2000); Fernandez et al. (2005); Zhu and Stein (2005); 
Diggle and Lophaven (2006); Gumprecht et al. (2009); Boukouvalas et al. (2009); Muller 
and Stehh'k (2010), among others. The advances made by Muller (1999); Muller et al. 
(2004) and Muller et al. (2007) that propose methods based on maximization of util¬ 
ity functions are especially relevant. Since these procedures incorporate the Bayesian 
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approach and intensive simulation methods in a natural way, they became quite appro¬ 
priate to perform the optimal design choice in Geostatistics. 

Here the uncertainties about model parameters will be considered allowing us to eval¬ 
uate their impact on spatial prediction (or kriging) under preferential sampling. With 
this aim, spatial prediction about the underlying process are performed by analysing the 
predictive distribution conditional on the observed data and the design sample. Then, 
comparisons between these predictions and those obtained according to the classical 
approach are made. 

However, the most important contribution of this paper is its analysis of preferential 
sampling effects in the process of obtaining the optimal design in Geostatistical mod¬ 
els. Using an approach based on maximizing utility functions (Muller, 1999) to obtain 
optimal design, the influence of preferential sampling is evaluated in situations where 
the researcher’s goal is to optimize some objective function, e.g. reduce predictive vari¬ 
ances. It will be shown through simulations that the optimal decision about this choice 
is substantially modified under a preferential sampling effect. 

This paper is organized as follows: Section 2 presents some background about spa¬ 
tial processes and a methodology for fully Bayesian inference, and spatial prediction 
in Geostatistics under preferential sampling using MGMG methods is described. Sec¬ 
tion 3 presents a method to obtain the optimal design based on maximization of utility 
functions. Section 4 combines this methodology to obtain the optimal design under pref¬ 
erential sampling. In Sections 5 and 6, we use the methodology to obtain the optimal 
design in some simulated examples, and we analyse a real dataset where the researcher 
is interested in monitoring the occurrence of extreme events. Finally, Section 7 discusses 
the results obtained. 


2 Spatial Process Optimal Design 

This section presents some background on Geostatistics and point processes, and pre¬ 
sents a procedure for obtaining the optimal design through utility functions. 

2.1 Geostatistics 

Geostatistics deals with stochastic processes defined in a region D, D C 3?^”, where usu¬ 
ally fc = 1 or 2. Following the approach in Diggle et al. (1998), one can assume that the 
researcher is interested in studying the features of the stochastic process {^(a:) : x G D}. 
Additionally, assume that S{x) is a Gaussian stationary and isotropic process, with 
zero mean, constant variance and autocorrelation function p{S{x),S{x + h);(j)) = 
pdl h ||;())), Vx € U, which may depend on one or more parameters represented by cj). 
Several families of autocorrelation function can be found in the literature (see Cressie, 
1993; Diggle and Ribeiro, 2007). 

Assuming that n observations Yi = Y{xi), i = 1,..., n are available and 

Yi = g + S{xi) + Zj, 

E[Zi\ = 0,Var[Z,] = , Vf, 
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one can consider Y = {Yi,... ,Yn) as a noisy version of the underlying process S. As 
usual, it will be assumed that Z = (Zi,..., Zn) has a Gaussian distribution independent 
of the process S. 

Under the Bayesian paradigm, the posterior distribution of all model parameters 
must be obtained to make inference. Assuming that 6 = (r^, (j), ji) represents the set 

of unknown parameters and that p(y | 6) is the likelihood function, we need to specify a 
prior distribution piO) to obtain the posterior distribution p{9 | y). It is usual to assign 
Gamma distributions to the parameters </>, a~‘^ and and a Gaussian distribution 
to p,. Since the posterior distributions obtained have no closed form, we can approximate 
them using MGMG methods. 

Usually there is an additional interest in obtaining predictions of the process S at 
locations not observed. Computations required in inference are usually implemented 
via discretization of the region D in M subregions or cells. Thus, we can redefine the 
underlying Gaussian process over the centroids of these subregions, S = {S*!,..., Sm}- 
In addition, we define the partition S = {S'y, S'tv} to distinguish the underlying process 
associated with the n observed and the N = M — n unobserved locations. The distri¬ 
bution of S' is a multivariate Gaussian distribution with dimension M = n + mean 
vector 0 and autocorrelation matrix given by 

n f Rn,N j 

^ “ V Rn j ’ 

whose elements are defined by the autocorrelation function p{ -; (^). Then, we have the 
model 

[y\Sy,p,T-‘^]'^N{lp + Sy,T‘^In), (1) 

[S\a-\4>]-N{0,a^R) (2) 

that is completed with the priors p ^ N{0,k), ^ G{ar,br) and cr“^ ^ G{aa,b„), 

where fc, ar^br, and b^- are known hyperparameters. 

Without loss of generality with respect to the objectives of this work, the exponential 
autocorrelation function p{\\ h ||; </>) = exp(— || h || /</>) will be used in the sequel. 

The full conditional distributions for the parameters p,T~'^, cr~^, and S can be 
updated by Gibbs sampling steps, whereas the range parameter (j) can be updated by 
Metropolis steps in the MGMG algorithm. The full conditional distribution for S is of 
particular interest in Geostatistics and has Gaussian distribution with mean vector and 
covariance matrix given by 

<j‘^RN,n{T‘^In + 0-‘^Rn)~^{y “ ^p) 

and 

(t'^Rn - Rn, n(T'^In + a'^Rn)~^o-‘^Rn,N- 

Expressions above are known as kriging predictor and kriging variance, respectively. 
There are extensions of the basic model of Geostatistics, most of them developed to deal 
with non-stationarity (Higdon et al. (1999); Fuentes and Smith (2001); Fuentes (2002) 
and Bornn et al. (2012)) or non-Gaussianity (Diggle et ah, 1998). 
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2.2 Spatial Point Process 

The use of point processes for modelling patterns of points in space has been intensified 
in the last decades, specially after the publication of the classic texts of Ripley (2005) 
and Diggle (1983). The study of point processes has also evolved, based in recent com¬ 
putational advances. Mpller and Waagepetersen (2007) present an excellent review of 
the methods and models to spatial point processes and highlight several applications 
and computational aspects related to inference. 

A spatial point process X can be understood as a random finite subset of event 
locations belonging to a certain limited region H C where usually k = 1,2 or 3. 
A spatial point process, defined in a region H C governed by a non-negative random 
function A = {A(a;) : x S 5R^}, is a Cox process if the conditional distribution of 
[X I A = A] is a Poisson process with intensity function A(a;). Additionally, if one can 
assume that A(a;) = exp{Z(a;)}, where Z = {Z{x) £ 5R^} is a stationary and isotropic 
Gaussian process, then it is said to be log-Gaussian Cox process (Mpller et ah, 1998). 

The likelihood function associated with this process is given by 



Although this function is not analytically tractable, inference on a log-Gaussian Cox 
process can be performed in a reasonably simple way through Monte Carlo simulation 
methods (Mpller et ah, 1998). Again, it is usual to represent the domain D of the point 
process as a grid and approximate the Gaussian process Z{x) by a finite-dimensional 
normal distribution defined on the grid. Waagepetersen (2004) showed that the dis¬ 
cretized posterior for log-Gaussian Cox process converges to the exact posterior when 
the sizes of the grid cells tend to zero. According to Mpller et al. (1998), if the point 
process is reasonably aggregated and has moderate intensity, the choice of the grid does 
not need to be fine to produce good results in inference. The inference conducted un¬ 
der the Bayesian paradigm, using MCMC methods, makes it relatively easy to obtain 
approximations of p(x | Z). On the other hand, the computational cost may be high 
(Mpller and Waagepetersen, 2007). 

2.3 Geostatistics under Preferential Sampling 

In the Geostatistics literature, it is common to consider the sample points x as fixed or, 
if coming from a stochastic process, independent of the process S{x). When the sample 
design is stochastic, we must specify the joint distribution of [F, S, X]. We have a process 
under preferential sampling if [S', A] ^ [S][A], i.e. the sampling design is dependent of 
the spatial process. 

The class of models proposed by Diggle et al. (2010) to accommodate preferential 
sampling effect assumes that, conditional on S, A is a log-Gaussian Cox process with 
intensity A(a;) = exp{a -I- /3S(a;)}. In addition, conditional on [S, A], we have that 
Yi ^ N[g S{xi),T‘^], i = l,...,n. Diggle et al. (2010) presents a way to evaluate 
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this distribution through a fine discretization of the region D. The region D can be 
discretized into M cells with centroids Xi,i = 1,..., M, where only one point is expected 
in each cell. 

In a Bayesian approach, we need to obtain the posterior distributions to make in¬ 
ference about the model parameters. For this purpose we assign Gaussian priors to a 
and /3. Pati et al. (2011) proved that the use of improper priors for the parameter that 
controls the preferential sampling effects produces proper posteriors. Thus, the data 
provide enough information to perform inference for this model parameter, even under 
vague prior information. 

Since the posterior distributions of these parameters have no closed form, we use 
an approximation making use of MCMC methods in a discretized version of the model 
given by 


a + f3S{xi))]'^' exp 

i=l 

where rii and represent the counts and the volume of the *th subregion, i = 1,, M. 

The full conditional distributions of the parameters /i, (j), and are given 
by the same expressions obtained in the case of non-preferential sampling. The full 
conditional distributions of /3 and a are updated in a similar way. The full conditional 
distribution of S is updated in MCMC by Metropolis steps (see details in Appendix A). 


M 

p(x I S, a, /3) oc n[exp( 



3 Optimal Design 

In general, finding an optimal design involves procedures for obtaining the maximum 
or minimum of an objective function. These objective functions usually quantify the 
gains and losses related to each possible decision. In this case, we need to decide which 
locations (in time or space) will be collected in order to better understand certain char¬ 
acteristics of the phenomenon. When the phenomenon of interest is studied assuming 
that it is governed by an underlying stochastic process, the methodology for obtaining 
the optimal design is usually performed via Decision Theory. For more details related 
to Decision Theory, see the classical textbook of DeCroot (2005). 

According to Muller (1999), the procedure for obtaining the optimal design can be 
performed by defining a utility function rt(d, 0, yd), where d = (di,..., dm)' represents 
the m new sample locations, di € D, and yu = (y^i, •. ■ ,ydm)' is the vector of future 
observations arising from it, i = 1,..., m. After a set of observations y is available, the 
optimal design is the vector d* that maximizes the function 

U{d)= J it(d,6»,yd)pd(yd I 0,y)p{O \ y)d6»dyd = Ae,y^|y[u(d, 0, yd)]. 

In other approaches, one can include additional information to obtain the optimal de¬ 
sign. In order to achieve the optimal design in time, Stroud et al. (2001) used covariates 
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to reduce the model parameter uncertainties, thereby obtaining the more appropriate 
point in time for the return of a patient undergoing treatment. On the other hand, Ding 
et al. (2008) used a hierarchical model to relate the effects of different treatments in 
clinical studies to determine the optimal design associated with a specific treatment. 

An interesting strategy to optimize the expected utility is known as an augmented 
model (Miiller, 1999). In this case, the optimal design point is considered as a parameter 
and can be estimated through its posterior distribution. Thus, an artificial probability 
model h(d, 9, yd) is defined assuming that D is bounded and u(d, 9, yd) is non-negative 
and limited. The distribution h(d,0,yd) is given by 

/i(d,6',yd) cx M(d,6>,yd)pd(yd | 9)p{9). 

Under this assumptions, the marginal distribution of d is proportional to 

h{d) oc J u{d, 9, yd)pd{yd \ 9)p{9)d9dyd = U{d). 

Therefore, finding the mode of h(d) becomes equivalent to maximizing the expected 
utility of d. Since the sampled values of d in the MCMC algorithm will be concentrated 
near regions of high expected utility, less time is consumed in the simulation procedure. 
In other words, one only needs to find the mode of the distribution, instead of dealing 
with an optimization problem. 

It is important to remark that this methodology can be naturally performed in 
situations where one wants to choose m > I new design points. The main difficulty in 
this case is to face the increased computational cost involved since the evaluation of u(d) 
is required at each iteration in MCMC. In addition, a large sample of d = (di,..., dm)' 
would be needed to evaluate h{d) properly. Alternatively, it is also possible to obtain the 
m new points sequentially. However, in this case, there is no guarantee that the resulting 
design will be optimal. The evaluation of the utility function also allows removing points 
instead of adding new points. In this case, the decision would be based on gains and 
losses in the utility after eliminating each site or group of sites. 

3.1 Utility Functions 

Commonly in Geostatistics, the researcher focuses more on spatial prediction than on 
inference about its parameters. In these cases, the choice of a utility function that 
depends on the predicted values and their variances is a natural choice. In this sense, 
the prediction variance associated with the location x € D, V{S{x) \ y) gives to the 
researcher the degree of uncertainty about his/her predictions after observing the data y. 
Thus, reducing this variance throughout the region D constitutes a reasonable criterion 
for choosing a new sample location d, d G D. Based on this principle, the following 
utility function is a natural choice for implementing the approach described in Muller 
(1999) in the context of Geostatistics: 

M(d,6»,yd) = J [V(S'(a:) | 9,y) -V{S{x) \ 9,y,yd)]dx, 


(3) 
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which can be interpreted as the gain obtained in reduction of prediction uncertainty 
after observing yj. Thus we need to maximize its expectation given by i7(d). 

Finaiiy, it is aiso important to note that the predictive variances used in the utiiity 
function it(d, 0,yd) depend oniy on the iocation of yu in D, instead of its vaiue. This 
feature of the predictive variance avoids the necessity of evaluating all possible values of 
yd while maximizing C/(d), thus reducing the computational cost involved. More details 
about the evaluation of the utility function (3) are presented in Appendix B. 

Other utility functions could be chosen. In Section 6, we employed a utility function 
that is higher for locations where extreme values or exceedances are expected. In this 
case, we have 

u{d, 9, yd) = P[|S'(a;d)| > xo\9, y] (4) 

where Xd represents the location associated with yd and xq is an extreme threshold. In 
practical situations, this expression could be improved by considering costs and risks 
related to the occurrences. 


4 Optimal Design under Preferential Sampling 

The problem of obtaining the optimal design d* for spatial processes under preferential 
sampling can be performed based on optimization of 

U(d) = Eff^y^i^^y[u(d,9,yd)] = J u(d, 6», yd)p(yd | 6»,x,y)p(6» | x,y)d6»dyd 

where p{9 \ x, y) is obtained in Section 2. Given the posterior distribution of 9 and the 
utility function, we can achieve the optimal design by seeking the mode of the posterior 
pseudo-distribution of d. The greatest difficulty in this step is evaluate the effects of 
preferential sampling in the utility function u(d, 0, yd). In many cases, like the situations 
presented in this paper, the evaluation of this function becomes challenging. 

As will be noted in the next section, preferential sampling directly impacts on the 
estimation of the mean p of the underlying Gaussian process. If the utility function 
M(d,0,yd) depends on this parameter, the choice of optimal design will be greatly 
affected. On the other hand, it would be expected that preferential sampling will also 
affects utility functions defined to quantify reductions of uncertainty related to each 
choice. This actually happens since the spatial configuration of the sample points also 
provides information about the underlying process. 

Using as an example the utility function defined in Section 3, one would need to 
include the information provided by the observed point process x, i.e. 

u(d,6»,yd) = J [V{S{x) I 6»,y,x) -V{S{x) \ 6», y, x, yd)]da;, 

and one would still need to know the variance of the distribution of [S \ 0,y,x]. Sam¬ 
ples of this distribution are easily obtained during the implementation of MCMC, as 
described in Section 2, but we cannot directly obtain estimates of this variance at each 
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iteration of the algorithm. To deal with this difficulty, one must resort to an approxima¬ 
tion. A sampling-based approximation would require an additional MCMC sub-chain 
at each iteration, thus increasing substantially the already high computational cost. An 
analytic, cheaper alternative is to use a Gaussian approximation of this distribution in 
order to evaluate the variances required in the utility function (3). Note that this ap¬ 
proximation is only used to evaluate M(d,0,yd). More detail about this approximation 
are presented in Appendix C. On the other hand, if the utility function (4) is used, we 
can directly evaluate U{d) from [S \ y,x], since a sample of this distribution will be 
available after performing the MCMC. 


5 Simulation Study 

In this section, we simulate datasets according the model presented in Section 2.3 to 
illustrate the effects of preferential sampling on inference and optimal design choice. In 
all cases, we chose a set of parameters which do not produce very large samples, in order 
to enhance the effects of preferential sampling. We also consider situations where one 
needs to add m = 1 and m = 2 new locations to the sample design. In order to produce 
the observations yi,..., j/n, we first generate a surface S{x), according the Geostatistical 
model presented in Section 2.1, over the discretized region D. Conditioning on S, we 
then generate a point process from the log-Cox Gaussian model presented in Section 2.2 
in order to obtain n observations. We considered five simulated studies: 

Case I. Simulation considering a one-dimensional region D = [0,100], partitioned into 
M = 100 sub-regions, where (a; /3; /r; 4>; r'^^n) = (—3; 2; 12; 2; 20; 0.1; 18). 

Case II. Simulation considering a one-dimensional region D = [0,100], partitioned into 
M = 200 sub-regions, where (a; /3; y; cr^; </>; T^;n) = (—3.5; 3; 12; 1; 20; 0.01; 123). 

Case III. Simulation considering a one-dimensional region D = [0,200], partitioned 
into M = 200 sub-regions, where (a;/3; y; cr^; r^; n) = (—1.5; 0.5; 12; 1; 20; 
0.01; 56). 

Case IV. Simulation considering a two-dimensional region D = [0,100], partitioned into 
M = 225 sub-regions, where (a; /3; g; T‘^;n) = (—8; 2; 12; 2; 20; 0.1; 12). 

Case V. Exactly as Case IV but with M = 400 sub-regions. 

Case V was considered in order to evaluate the discretization effect. Figure 1 shows 
the Gaussian processes S simulated for Cases I-IV. Since /3 > 0 in all simulations, the 
observations are concentrated near the sites where the process S showed higher values. 

5.1 Inference and Prediction 

Posterior inference about model parameters was performed. Inference was also per¬ 
formed without considering preferential sampling, i.e. using the Geostatistical model 
of Section 2.1 to allow comparison. Prior distributions p ^ a ^ /3 ~ Af(0;10^), 
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(a) Case I 


X 

(b) Case II 




(c) Case III (d) Case IV 

Figure 1: Realization of the one-dimensional (Cases I, II and III) and the two- 
dimensional (Case IV) simulated processes with respective observed values (circles). 
Except for (c), the observations are concentrated near the sites where the process S 
shows higher values. 


T~^ ~ ^ 0(2; 0.5) and (p ~ G(2;0.05) were used in all cases. Furthermore, in 

Cases /-///, 500,000 iterations were generated in the MCMC algorithm and only the 
last 100,000 were used to compose the posterior distribution samples of the model 
parameters. For the other cases, 400,000 iterations were generated and only the last 
50,000 were considered. The convergence of the chains was assessed by visual inspection 
of several chains generated from different initial values. 































720 


Optimal Design in Geostatistics under Preferential Sampling 


Considering the effect of preferential sampling, the posterior distributions of each 
model parameters are concentrated around the true values of the model’s parameters, 
except for the in Case I. However, this parameter was also underestimated consid¬ 
ering a non-preferential model. Figure 2 shows that the variogram estimated assuming 
the preferential sampling effects is slightly closer to the true variogram for the Cases I 
and V. Figure 2 also shows that the variograms estimated by the preferential model 
are closer than those estimated by the non-preferential model in Cases II and IV (Fig¬ 
ure 2(a),(b),(e), and (f)). Finally, in Case ///, since /3 is small, the estimated variograms 
are very similar (also in Figure 2). 

In particular, in Case IV, the posterior distributions are not concentrated around 
the true values for all parameters of the model. This difficulty is partly justified by 
the small sample size. However, the results obtained by the model with preferential 
sampling are more satisfactory. Besides a better estimated variogram, this also can be 
observed through the posterior distributions of /i, shown in Figure 3. 

Figure 4 shows the predicted values of S, represented by the median of a posterior S, 
and the respective 95% credibility intervals for each model in Cases I and II. 

Analysing Figure 4(a)-(b), we can see that only the credible intervals which consider 
the effect of preferential sampling encompass the most extreme values of the simulated 
process S. Additionally, it can be seen that the point estimates of S in the regions where 
the process was hardly observed are better with the model with preferential sampling. 
The differences are even more pronounced when we analyse the results for Case II, where 
the intervals are narrower for the preferential model (Figure 4(c)-(d)). In addition, the 
non-preferential model underestimates the process S. 

Since the estimated variograms obtained by both models were similar in Cases I 
and V, it seems reasonable to conclude that the differences observed in prediction for 
these cases were caused mostly due to the differences between the predictive distribu¬ 
tions [S' I y] and [S | y, x]. 

Finally, Figure 5 shows the predicted surfaces of S -I- /r, represented by the posterior 
means, obtained for each model in Case IV. It can be observed that only the preferential 
model can identify regions where the underlying process presents low values. This feature 
avoid predictions concentrated around the mean, like those obtained by the traditional 
kriging methods. In addition, we can conclude that the small sample size intensifies the 
effects of preferential sampling in prediction. 

All simulations presented in this section reflect situations where there are few ob¬ 
served sample points. This is not uncommon in practice, especially in monitoring studies 
of environmental or climatic phenomena which are rare or difficult to detect. However, 
even under these conditions, the use of models assuming the existence of preferential 
sampling effects has produced variogram and kriging surfaces generally closer to the 
true values when compared to the estimates produced without considering this effect. 
Another major advantage of using these models is the correction that is made dur¬ 
ing the inference about the underlying process mean. Finally, we showed the ability of 
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distance 

(e) Case IV - preferential 



distance 

(f) Case IV - non-preferential 


Figure 2: Posterior medians (red line) and respective 95% credibility intervals (dashed 
lines) of the variogram obtained by the preferential model (left) and by the non- 
preferential model (right) for Cases II, III and IV. The circles represent the empirical 
variogram and the black line represents the true variogram. 
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Figure 3: Posterior distributions of g under preferential sampling (left) and without 
considering this effect (right) for Case IV (red points represent the true values of the 
parameters). 


this model to identify areas where the underlying process takes extreme values, even in 
situations where there are no samples nearby. 

Diggle et al. (2010) evaluated the influence of preferential sampling influence in the 
predictions of Gaussian processes using [iSly, 9\ after correcting the bias caused by pref¬ 
erential sampling. However, these predictions did not take into account the information 
provided by x. In the context of environmental monitoring, Shaddick and Zidek (2014) 
also presented a methodology for correcting the bias caused by a selective reduction of 
sample sites. In contrast, the Bayesian approach provides samples directly of the distri¬ 
bution of interest [5'|y,x]. The comparison between the predictions assuming that 9 is 
known reinforced the conclusion that methods based on corrections of the variogram’s 
bias are not sufficient to reproduce the true uncertainty associated with the predictive 
distribution of the underlying process S. 

Performing different simulated scenarios of prediction under preferential sampling ef¬ 
fects, Gelfand et al. (2012) also concluded that they affect more significantly the spatial 
prediction than the estimation of the model parameters. In their paper, they discussed 
ways to evaluate the effects of preferential sampling by comparing two predicted sur¬ 
faces. One of the forms of global comparison they mentioned is associated with the local 
and the global mean squared prediction error. The Local Prediction Error associated 
with xo, denoted LPE{xq), is given by 

LPE{xo) = i^[^(a;o) - ^'(xo)]^, 

where S{xo) is the predictor of S in Xq. The Global Prediction Error is given by 

GPE = 7^7 / LPE{x)dx. 

\D\ Jd 

Table 1 presents the GPE values for each of the simulations. Based on this table, one 
can observe a significant reduction obtained by considering the effects of preferential 
sampling. 
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X 

(a) Case I - preferential 


(b) Case I - non-preferential 




(c) Case II - preferential (d) Case II - non-preferential 

Figure 4: Simulated process (solid line), posterior median of the predictive distribution 
(red line) and the respective 95% credibility intervals (dashed lines) for S obtained by 
considering (left) and by not considering (right) the effect of preferential sampling in 
Cases I and II. 


The one-dimensional simulation showed that the LPEs are reduced when they refer 
to the locations where the underlying process S has lower values. Similar conclusions 
were obtained in the second simulation, since errors remain smaller in regions where the 
magnitude of S is lower when the preferential sampling effect is taken into account in 
the modelling. Further simulations without the effect of preferential sampling were also 
performed. In such cases, in general, the model assuming preferential sampling produced 
posterior distributions of P centred at zero. 
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Figure 5: Posterior mean of the predictive distribution of [S' + p] obtained considering 
(left) and without considering (right) the effect of preferential sampling for Case IV. 


Simulation 

No preferential sampling 

Preferential sampling 

Case I 

0.9496 

0.7301 

Case II 

0.5533 

0.1783 

Case III 

0.5286 

0.4512 

Case IV 

2.1336 

1.6789 

Case V 

1.6214 

1.3482 


Table 1: Global Prediction Error (GPE) for each of the simulations. 


5.2 Optimal Design 

Using the one-dimensional simulated data presented in Case /, 83 auxiliary points, 
required for evaluation of the utility function (3) described in Section 3, were also used 
to form a grid and obtaining the predictive variance reductions. Using the samples of 9, 
which were obtained from the posterior distribution in the MCMC algorithm, we can 
generate samples of d as mentioned in Section 3 to obtain a new optimal sample point 
for the case where m = 1. 

Figure 6 shows the histograms of the posterior pseudo-distribution of d with and 
without the preferential sampling assumption, respectively. It can be noted that the op¬ 
timal design choice without preferential sampling leads the researcher to select locations 
where there are no nearby points, i.e. in the interval [5,35]. On the other hand, under 
preferential sampling, the results lead the researcher to a different direction. In this case, 
except in the subregions where there are several observed samples, the other choices have 
similar expected utilities. In summary, under preferential sampling, the optimal design 
choice is notably changed when the researcher wants to reduce the predictive variance. 

To illustrate the effect of the preferential sampling where m > 1, we obtained the 
pseudo-distribution of d = {di,d 2 )' assuming that two new locations would need to be 
added in the sample design. The results are presented in Figure 7. Analysing the results 
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Figure 6: Posterior pseudo-distributions of d under the effect of preferential sampling 
(left) and without considering this effect (right) for Case I. 


for the non-preferential model, it can be seen that the optimal solution could be to 
add one point di from interval [0, 20] and to make no restrictions to the location of the 
other point d 2 , since h(d) does not vary much. On the other hand, the results for the 
preferential model indicate that the optimal design should include one point di from 
interval [0,4] and another point ^2 from interval [88,92]. 



0 20 40 60 80 100 


Figure 7: Posterior pseudo-distributions of 
sampling (left) and without considering thi 



^^^ 

0 20 40 60 80 100 


= (di, ^ 2 )^ under the effect of preferential 
effect (right) for Case I. 


In the two-dimensional Case IV, 900 auxiliary points were also used to form a grid 
and obtaining the predictive variance reductions, as in the previous section. Figure 8 
shows the posterior pseudo-distributions of d under preferential sampling and without 
considering this effect for the case where m = 1. 

It can be noted in Figure 8 that the areas with the highest expected utility are not the 
same for the two models. As expected, the results obtained without considering the effect 
of preferential sampling makes the researcher choose locations far from the observed 
points. In contrast, under preferential sampling, the largest utilities expected are more 
dispersed over region D. Again, under preferential sampling, the optimal design choice 
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Figure 8: Posterior pseudo-distribution of d under the effect of preferential sampling 
(left) and without considering this effect (right) for Case IV. The densities of these 
pseudo-distribution are multiplied by 100 for better visualization. 


was changed, since the regions with few sample sites also provide useful information 
about S. 

The two simulated studies have involved situations where the inference step has 
produced similar results (in the one-dimensional case) and different results (in the two- 
dimensional case). Surprisingly, even in the one-dimensional case, where the estimated 
variograms were similar, the process of choosing the optimal design led to quite different 
results. 

However, the use of a different utility function can produce even more extreme 
results. Even though the utility function chosen in this paper was designed for our Geo¬ 
statistics goals, other functions could be considered. According to the results obtained 
from the simulated studies, utility functions that depend on the underlying process p 
may also be affected by the preferential sampling effect. This situation will be explored 
in next section. There are other effects that may affect the results, such as the choice 
of the auxiliary grid (used to evaluate the predictive variance reduction) and the dis¬ 
cretization level of D. Current computational costs associated with this methodology 
are still a barrier for a more thorough evaluation of the degree of influence of each of 
these marginal effects. 

Effectiveness of the Optimal Decision 

After obtaining the optimal design, one can evaluate if the results are better under 
preferential sampling. Thus, we performed an analysis of the GPE after this decision 
to the two-dimensional simulated data Case IV. The coordinates of the optimal design 
was Xd = (90.00; 76.66), under preferential sampling, and Xd = (30.00; 50.00) without 
this effect. In addiction, it was assumed that = 0. 

Finally, we proceed to the inference via classical Geostatistics methods but including 
the optimal data point Yd obtained under preferential sampling. The results provided 
a GPE = 1.8392, which is lower than that obtained using the optimal data point Yd 
pointed out by the non-preferential model {GPE = 2.3348). Thus, the optimal design 
obtaining under preferential sampling was more advantageous even when the inference 
is performed via classical Geostatistics methods. 
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6 Case Study: Rainfall Data in Rio de Janeiro 


The methodology is now applied to a real scenario in the context of monitoring net¬ 
works. More specihcally, we will analyse pluviometric precipitation data obtained from 
32 monitoring stations located in the city of Rio de Janeiro, Brazil. The data refer to 
the period from 1 to 31 October 2005 and were obtained from the Pereira Passos Insti¬ 
tute, an official agency associated with the local government. The rainfall during this 
month, which begins the rainy season in the Brazil’s Southeast, is of particular interest 
to meteorologists and government agencies (Alves et ah, 2005). Figure 9 shows the map 
of the Rio de Janeiro city with the respective precipitation levels observed. 



Figure 9: Pluviometric precipitation in Rio de Janeiro city in October/2005 (separated 
according the 0.20, 0.40, 0.60, and 0.80 quantiles and grouped by the colours: blue, 
green, yellow, brown, and red, respectively). 

Analysis of the spatial distribution of the precipitation seems to indicate that the 
stations are more concentrated near places where rainfall level is higher. Even though 
the geography and the spatial distribution of economic activities could be considered as 
possible causes of this sample design, the methodology for choosing a new design point 
under the effect of preferential sampling can be employed here. 

For inference, we used the same priors of the previous simulations (by changing 
some hyperparameters) and the study area was partitioned into M = 332 subregions. 
We monitored 100,000 iterations in the MCMC algorithm for both models and the 
first 10,000 were considered as burn-in. The convergence of the chains was assessed 
by visual inspection of several chains generated from different initial values. Table 2 
presents summaries of the posterior distributions for all model parameters. An analysis 
of the results suggests that the effect of preferential sampling is significant, indicating 
a positive association between the sample design and the rainfall intensity. 

As for the other model parameters, we observed differences between the estimates 
for the mean /r, which can be explained by the presence of preferential sampling effects. 
The utility function (4) was used to obtaining the optimal design, with xq = 200 mm. 
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Model parameters 

Preferential 

Non-Preferential 


1.25 (0.49; 2.90) 

1.32 (0.51; 3.45) 

a" 

4289.92 (2096.31; 10528.91) 

4132.72 (2120.30; 8514.38) 


104.84 (97.73; 110.60) 

119.88 (111.49; 130.32) 


10.69 (4.27; 26.75) 

10.43 (4.51; 22.76) 

a 

-3.84 (-4.24; -3.48) 

— 

p 

0.008 (0.002; 0.014) 

— 


Table 2: Posterior mean and 95% credibility interval (in parentheses) for model param¬ 
eters. 



(b) Non-Preferential 

Figure 10: Expected utilities U{d) obtained under preferential sampling (a) and without 
considering this effect (b) for the Rainfall data. 
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This utility function assigns greater utility for regions where it is most likely to observe 
a monthly rainfall above 200 mm. The expected utility U {d) obtained for each model is 
shown in Figure 10. 

The preferential model has concentrated high expected utility into a small region in 
map since the utility function favours regions with the highest probability of extremes 
values and due to the overestimation of On the other hand, the non-preferential model 
has produced expected utilities more spread out over the southern part of the city. 


7 Discussion 

The results obtained in Sections 5 and 6 help us to understand the effects of preferential 
sampling on inference and optimal design choice in the context of Geostatistics. The 
results of Section 5 showed that the bias corrections made during the estimation step 
are not enough to ensure a satisfactory prediction. On the other hand, the Bayesian 
approach presented here allows us to make predictions about any functional of S directly 
from [S I y, x], which is the correct predictive distribution under the effect of preferential 
sampling. Furthermore, the inference about the parameters that define the effect of 
preferential sampling, i.e. a and /?, was quite satisfactory in all simulations. Despite 
the high associated computational cost, the approach is computationally feasible and 
showed satisfactory results. 

In practical situations, it may be unlikely to assume that the design is governed 
by a log-Gaussian Cox process. However, this model seems to be flexible to obtain 
understanding about the consequences if the researcher has no covariates or a better 
explanation for its true causes. Although not applicable in the strictly spatial context, 
an alternative approach to detect and deal with preferential sampling is spatio-temporal 
analysis, since the changes on site locations over time may be informative about the 
association between its configuration and the underlying process. Zidek et al. (2014) 
present a method that can learn about the preferential selection process over time and 
that allows the researcher to deal with its effects. 

Traditional approaches to deal with preferential sampling also include the fit of 
trend surfaces to assess first order effects and, in the context of survey sampling meth¬ 
ods, weighting schemes. However, both require some prior knowledge about the sample 
selection processes. The researcher must carefully evaluate the (theoretical or empir¬ 
ical) evidence of preferential sampling effects. The use of this approach can produce 
misleading results otherwise. If there is evidence of preferential sampling, the authors 
believe the use of models based on latent processes, such as log-Gaussian Gox process, 
should be used even when the dependence structure between X and S' is not completely 
known. 

As evidenced by simulations, the effects of preferential sampling on optimal design 
choice cannot be disregarded. The knowledge about the sampling pattern reduces the 
predictive variance of S in areas poorly sampled, substantially changing the optimal 
decision. On the other hand, a utility function based on exceedances seems to over¬ 
estimate /r under preferential sampling. As a result, the ideal region to receive a new 
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sample becomes very small. However, obtaining the optimal design in situations where 
there is a need to monitor extreme events or exceedances may be not simple. Chang 
et al. (2007) presented an approach to deal with some challenges arising in designing 
networks for monitoring fields of extremes, as the loss of spatial dependence and the 
limitations of conventional approaches. 

Potential areas to apply this methodology include the study of phenomena scarcely 
observed and those in which, due to researcher’s interest or limited resources, can only be 
observed in locations considered critical. One can include the monitoring of mosquitoes 
or other disease spreading pests that only become detectable in places where its oc¬ 
currence is very high among the phenomena that are scarcely observed and difficult to 
be detected. Another potential application is the monitoring of maximum and minimal 
temperatures in regions near airports or industrial plants. In both cases, it seems reason¬ 
able to infer that the way the phenomenon is observed can be related to the underlying 
process. In this situation, the optimal design choice can change significantly, since the 
spatial point pattern brings valuable information to the researcher. 

The methodology employed can be adjusted in order to produce pseudo-distributions 
of d more peaked around the mode. The strategy based on simulated annealing (Muller, 
1999) can be explored for this purpose. The approach of Muller (1999) for optimal design 
choice has computational advantages in comparison with traditional procedures of opti¬ 
mization. The choice of a utility function based on predictive variance reduction is easily 
justified in Geostatistics. Under preferential sampling, the use of utility functions that 
directly depend on g seems to be more affected than those based in variance reductions. 

This methodology also has an expensive computational cost and alternative proce¬ 
dures can be used to deal with this problem, e.g. Simpson et al. (2011) that use the 
Integrated Nested Laplace Aproximation - INLA methods (Rue et al., 2009), the use of 
Predictive Process (Banerjee et ah, 2008), methods to approximate likelihoods (see Stein 
et ah, 2004; Puentes, 2007) and the use of sparse covariance matrices (Furrer et ah, 2006). 

The authors recognize the limitations in the simulation studies presented here. The 
complexity and the variety of situations arising from designing under preferential sam¬ 
pling lead us to focus on specific features rather than obtaining more general conclusions. 
Finally, the authors also invite researchers interested in reproducing the methodology 
presented in this paper to contact the authors in order to obtain more details about the 
computational implementation. 

Appendix A 

In the preferential model, the full conditional distributions of S, (3 and a are proportional 
to 


p{S\g,T ^,(;!),a,/3,x,y) cx exp 
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p(/3 \ cr (j), a, X, y) cx p(x | S, a, /3)p(/3) 


M 


oc exp < pS'n — Ae“ ^ exp(/?5'(a:i)) — 


^1 

2k I ’ 


M 


p{a\S,^j,,T ^,(7 /3, X, y) oc exp < na — Ae“ exp(/35'(xi))— 


a 


In the MCMC, these quantities can be updated in Metropolis steps assuming a Gaus¬ 
sian proposal distribution centred in the previous values sampled. Then, we accept the 
proposal with probability 

PS = exp|-^[^;P’'°P5f - S'ySy - 2{SP^°P - SyYiy - pi)] + - ^)'n| 

r ^ f Q/ p—1 C Qfprop p —1 Qprop\ 

\ Ae“g[exp(^5(x.)) - exp(/35(x.)^™^)] + 


X exp 


( ^ — / 92 prop\ 

PP = exp (/3P™P - /3)S'n + Ae“ ^(exp(/35(x.)) - exp(/3^’™^’5(x,))) + 


2k 

Pa = exp|(aP™P - a)n + (e“ - e“'’''“*’)A^exp(/35'(xi)) -b — -^ 

respectively. The vector n' = (ni,n 2 ,... jUm) represents the number of observations in 
each subregion, where X]i=i ~ 

In both models, the full conditional distribution of (j) is proportional to 

P{(l> I -S,/r, cr“^ Of,/3, y) oc exp ^ 

and this parameter can be updated in Metropolis steps assuming the following proposal 
distribution 

q{(j)P™P I (f>) ~ Lognormal (In(^) — 5/2; (5), 

where 5 must be chosen in order to produce reasonable acceptance rates in MCMC. 
Then, we accept the proposal with probability 




/^rop\ 

\~V ) 


O-cj) 


X exp< — 




2a2 


+ b4ci)-r^°p) 


(In </ - In (j)P^°P -b 5/2)2 _ ^rop _ ^ + 5/2)"^ 

25 


Appendix B 

To evaluate the integral in expression of M(d,0,yd) a discretization of the region D 
can be applied yielding M subregions, as described in Section 2. Then, we have that 
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u{d,9,yd) can be approximated by 


(d,6»,yd) = I 6»,y) - y(5'^ | 6»,y,yd)], 


where [5^ | 9, y] and [Si \ 9, y, yd], i = I,., M, are Gaussian with respective means 
CT^r„(T^/„ + cr^i?„)"^(y - 1/r) and + tT^i?n+„)“^(y* - 1^) 

and variances 

- cr^r„(r^/„ + and V^r„+m, 

where y* = (y,yd)'- 


Appendix C 

Following the procedure proposed in Section 4, we can make an approximation in p(x | 
S,a,P) to obtain an analytical expression of T^[S' | 0,y, x]. In particular, expanding 
the exponential function in Taylor series around zero, up to the second order term, we 
obtain the following approximation 

exp(a + pS{xi)) K, e“ Tim + /SIm'-S* + ^S'S 


By inserting this expression in p(x | S,a,P), it can be shown that the conditional 
distribution [S' j 9, y, x] becomes Gaussian with mean vector 0 and covariance matrix 
S given by 


„ „ / (yn - gn)T +/3n- A/3e“n \ 

0 = 2 j X and 

-A/3e“lN J 

(T-2 + A/32e“)/„ + i?-iS„,ArA-iSw,nS-i + (T-2s-i -R-^Rn,NA-^ V' 

"1^ -A-^RN,nRn^ A/l2e“/^ + A-i j ’ 

where A = g'^Rn — Rn ,nRn^Rn.N , and the vectors n and yn represent the number 
of observations and the total observed in each subregion of D, respectively. If /3 = 0, 
this matrix becomes equal to the kriging variance matrix traditionally obtained in Geo¬ 
statistics. 
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1 Introduction 

We would like to start by thanking the Editor of BA for the opportunity to discuss our 
work and the discussants Peter J. Diggle, Michael G. Ghipeta, James V. Zidek, Noel 
Cressie and Raymond L. Ghambers for a very thorough evaluation of our contribution 
and for their thoughts on the topic. Our rejoinder to the discussion will be presented 
in the following topics: Preferential Sampling; Auxiliary Information; Models for [Alls']; 
Utility Functions; Sequential Design; and Approximations. 


2 Preferential Sampling 

Preferential sampling plays an important role in surveys routinely carried out by Offi¬ 
cial Statistics agencies, as those developed in the Brazilian Institute of Geography and 
Statistics (IBGE), the institution that one of us is affiliated to. So we are well aware of 
the relevance of this issue. In addition to the areas of application of preferential sampling 
cited by the discussants, it is important to mention the very topical area of publication 
bias (see Bayarri and DeGroot, 1993; Franco et ah, 2014). 

The link between preferential sampling and the methodology of survey-sampling 
presented by discussants is also very helpful in clarifying the similarities of approaches. 
We agree with Gressie and Ghambers (hereafter GG) that papers from the latter may 
bring important aspects of sampling design to the context of Geostatistics. However, 
an important distinction between the approaches is needed. While in the context of 
survey methods the population size is generally fixed at a finite N, this feature is 
not true in the context of Geostatistics. In fact, in Geostatistics a finite value of N 
may only be associated with the discretization of a continuous process. Thus, part 
of the similarity between the approaches stems from the current limitation of many 
approaches to handle inference and prediction in Geostatistics appropriately due to the 
use of discrete approximations. 

We agree with Ghipeta and Diggle (hereafter GD) that preferential sampling is a 
method of adaptive design, which may depend on the previous design without relying 
on the underlying process S. Similarly, inference may be simplified after assuming that 
the process X is governed by the values of a spatially distributed covariate W, thus 
rendering conditional independence between X and S given W. The main difficulty is 
finding and quantifying such a covariate. We will return to this issue in the next section. 
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3 Auxiliary Information 

All discussants brought in the issue of replacement of some of the latent and unobserved 
random processes by observed quantities. These processes are a mere artifact to best 
represent our lack of knowledge about the true mechanism underlying the observation 
processes, if such truth exists. If we knew what variables cause our observational pro¬ 
cesses to behave as they do and were able to measure these quantities, we should use 
them. Otherwise, models that incorporate relevant features such as smoothness of these 
processes are a useful alternative. This issue is not only relevant to Spatial Statistics 
but to any other area of Statistics where absence of complete knowledge of the sources 
of variation forces the use of qualitative information in the form of latent processes, and 
Gaussian processes are one of the most adequate choices as a first step. 

We agree that covariates could (and should) be part of all model components. Co¬ 
variates may also be used to construct a deterministic intensity function of the point 
process X, or as proxy of a random process S in a model with a random intensity 
function. They may reduce remaining spatial heterogeneity in the mean or covariance 
structure of S, when available. CC suggested the use of other covariates Z, assumed to 
be highly correlated with the process S, as a proxy. This is a practical solution that needs 
to be used carefully. Despite simplifications produced in inference, it is important to 
bear in mind that this solution may also introduce another source of error in the model. 

However, despite being a natural choice, we do not always have available covariates. 
In other situations, the covariates are available only in part of the region of interest. 
In these cases, it is often necessary to interpolate values before including them in the 
model, adding more uncertainty to the results. 


4 Models for [X15] 

We agree that an appropriate model is crucial and robustness considerations are even 
more important here. Many discussants of Diggle et al. (2010) seemed to agree that, 
in practical situations, it is unlikely that the design is governed by a log-Gaussian 
Cox process. This issue is in line with our views expressed in the first paragraph of 
the previous section. In some cases, it may be enough to assume that the intensity of 
the point process is somehow proportional to the underlying process at an appropriate 
scale. In these cases, a log-linear intensity function may be seen as the hrst option for 
approximation and may thus be a good starting point to mitigate and understand the 
consequences of a preferential sample. 

In addition to methods for obtaining a robust design against selection bias, alterna¬ 
tive specifications for [Ails'] may also contribute to a satisfactory model-based inference. 
All discussants have correctly expressed concern about the sensitivity of the inference 
to the choice of the model for the intensity function. We concur with their concern and 
particularly welcome the use of non-parametric specifications to replace the parametric 
specification of our paper. This is bound to lead to a more flexible and hence robust 
alternative to the global linearity imposed by the a f3S predictor over the entire region 
of interest. 
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One approach following this path allows the regression coefficients’ processes a and /3 
to vary locally over space. Inference for these models may be performed in a simplified 
fashion, via discretization of coefficients over sub-regions (Pinto Junior et al., 2015), 
or exactly, by retaining the continuous variation of the infinite dimensional regression 
coefficient process (Gongalves and Gamerman, 2015). Another non-parametric approach 
was proposed by Kottas and Sanso (2007), based on mixtures of Dirichlet process priors. 
Their idea could be adapted to a prior for the intensity of a log-Gaussian Gox process 
that (instead of being concentrated on) is only centered at a -I- /3S, again allowing more 
flexibility for the intensity. 

Zidek also suggested an interesting situation where the generating process for the 
locations is based on another process S*. In this case, it is first necessary to assess 
the degree of dependence between these two processes, that is, to evaluate if [S', S*] = 
[S][S*]. If so, the researcher can consider the points X as ancillary for S and proceed 
with the non-preferential, standard inference as usual. Otherwise, it may be necessary 
to establish some form of dependence for [SjS*] to be able to proceed with inference. 
A bivariate specification for [S, S*] may be one way forward (see Gamerman et al., 2007; 
Grainiceanu et al., 2008). 

In conclusion, our model for [AjS] seems to be able to highlight the effects of prefer¬ 
ential sampling in the absence of a better understanding of the true causes of variation 
or of relevant covariates, but more flexible forms are needed. 


5 Utility Functions 

We agree with GG’s suggestions about notation to improve the paper understanding. 
In particular, we consider pertinent the suggestion of making Sd explicit in the utility 
function. 

The choice of a particular utility function is a crucial step to obtaining the optimal 
design. For this reason, we completely agree with GG that the questions how much? 
and why? cannot be set aside while the researcher is planning a new location sample. 

Utility functions based on predictive variance reductions have long been recognized 
as appropriate to measure improvements in predictive accuracy. Zidek noted that more 
general evaluation (of predictive/estimation performance) may require the use of other 
utility functions. We already presented at least one alternative formulation early in the 
paper to emphasize our adherence to this point and to detach ourselves from the initial 
approaches based only on a single function, evaluating predictive errors. Our methods 
apply equally well to any quantifiable utility function. 

The combination of different goals — e.g., reducing predictive errors, reducing un¬ 
certainty with respect to S', identifying thresholds, reducing estimation errors and eval¬ 
uating costs involved — allows the researcher to obtain designs in complex situations. 
Examples of complex utility functions with competing goals can be found in the re¬ 
cent work of Muller et al. (2004), Ruiz-Gardenas et al. (2012) and Ferreira (2015) to 
name a few. Note that these goals may be competing, e.g., reduction of the nugget 
effect estimation error assigns more utility to regions close to locations already sampled 
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whereas reduction of predictive error assigns more utility for regions far from the points 
already sampled. The additional challenge in this case is to weigh the various objectives 
involved. 

Alternatively, the use of entropy in the utility function is usually recommended 
for situations where multiple goals are involved (Caselton and Zidek, 1984) but this 
is not an easily implementable solution in practice. We do believe that in practical 
situations the utility function needs to be specified by whoever is in charge of the 
analysis, be it a regulatory body, a government institution or a researcher, with the 
help of a statistician. The responsible entity must be able to value the worth of the 
information each component of the utility provides. If one can answer, for example, 
why predictive variance reduction is relevant, one must be able to assign its monetary 
value; otherwise, one must reassess the relevance of each component included in the 
utility function. This value is more easily combined with readily quantihable monetary 
components such as cost associated with each new location. An economist specialized 
in the area of the study may be a useful addition to the team setting up this enterprise 
at this stage. 


6 Sequential Design 

The approach we used for design (Muller, 1999) allows the planning of m new sampling 
sites according to the utility function defined by the researcher. This design plan can be 
sequential or not, while the underlying process S can be static or dynamic in time. In 
cases where S' is a dynamic process, planning of a sequential design scheme is a natural 
choice. However, in the case where S is fixed in time, one can also plan a sequential 
design, although it is not possible to ensure that the resulting design will be optimal. 
Actually, a sequential sample design can be a simple and cheap solution, especially in 
situations where the costs are not fixed and can increase before obtaining the desired 
new sample locations. 

We anticipate difficulties to assigning a distribution for [^15, x] without reducing the 
flexibility of the model, when a sequential-sampling-design strategy to update 0 and S 
through [d|x, y] is built. It would be necessary to assess whether the initial sample is 
preferential in cases suggested by CC with a pilot study. If it is possible to assume that 
the pilot phase is non-preferential, then this information will be ancillary to inference. 
On the other hand, if the pilot sample is preferential, then it is possible to incorporate 
this information by performing modifications in [A'jS']. Note that this may remove the 
Poissonity of the model for [A[S'], due to repulsions that may occur around previously 
selected locations. This will imply more difficulty for likelihood approaches and opens 
up for interesting research questions. 

The case of a dynamic underlying process is more complex. We visualize the following 
generalization of the 2-stage set-up proposed by CD. In cases where samples are taken 
at different times in a multi-stage scenario, we may have 

[F,X,A] 

t 
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where Za-.b = {Za, Za+i, ■ ■ ■, Zb-i, Zb), for a < 6 integers, and dependence on hyperpa- 
rameters was removed from the notation as in CD. Conditional independence between 
observations given the corresponding underlying processes may be assumed in some 
cases leading to [Yt\Yi.,t-i,Xi,t, St] = [Yt\Xt,St]. Similarly, [Xt\Xi.,t-i, Si,t] = 
may be assumed for some sampling schemes, although the general formulation may be 
required in some cases (see the paragraph above). Further simplification such as those 
suggested by CD may be assumed, depending on the situation. Ferreira (2015) worked 
on a similar structure, simplified by his non-preferential sampling scheme. 

In the practical situation presented by CD, where the sampling order is a crucial 
factor, a simple utility function based on predictive variance reductions or exceedances 
probably would not be enough to produce a satisfactory sample design. In challenging 
situations like this, it would be necessary to choose more complex forms to reward each 
sample unit, at each time, in a sequential sampling scheme. 


7 Approximations 

Questioning the stationarity assumption is a mandatory task in any Spatial Statistics 
problem. Stationarity can always be seen as an approximation to a more complex un¬ 
derlying dependence. But it turns out that in many practical situations it has proved to 
be a reasonable, viable solution. Alternatively, approaches based on convolution process 
(Higdon, 2002) or the use of more flexible (non-parametric) structures to enhance the 
spatial dependence structure of S can be used. Again, covariates are always relevant 
options to handle the large-scale heterogeneity considered by CD. Obviously, an increase 
in the complexity of the model can also complicate the evaluation of the utility function 
used to obtain the optimal design and parsimony may have to be called into action. 

We agree with Zidek that other approximation of ^(Alx, y) could be used in this 
step. A simpler, but expensive, alternative is to generate sub-chains in order to estimate 
this quantity during MCMC. Alternative analytical approaches may be preferred to 
avoid the increasing computational cost. It is important to emphasize that 

• The approximation was only used to evaluate the utility function, since an analyt¬ 
ical expression for V (5'|x, y) is not available; the values of S were always sampled 
from the correct distribution [5'|y,x]; 

• Other utility functions, as those used in our case study, may not require any 
approximation. 
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